Convergent Combinations of Reinforcement Learning with Linear Function Approximation

Authors

  • Ralf Schoknecht
  • Artur Merke
Abstract

Convergence of iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. In practical applications, however, it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient conditions for convergence of combinations of reinforcement learning algorithms and linear function approximation. This makes it possible to analyse whether a given reinforcement learning algorithm and a given function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data.
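To make the setting concrete, here is a minimal sketch of the repeated synchronous update the paper analyses: a residual-gradient step applied to a fixed set of transitions with a linear value function V(s) = phi(s)·theta. The feature map `phi`, the data layout, and the step size below are illustrative assumptions, not the paper's exact construction; for grid-based linear interpolation, `phi(s)` would hold the interpolation weights of state s over the grid vertices.

```python
import numpy as np

def residual_gradient_sync(phi, transitions, gamma=0.9, alpha=0.1, n_iters=1000):
    """Repeated synchronous residual-gradient updates on a fixed set of
    transitions (s, r, s') with a linear value function V(s) = phi(s) @ theta.
    Sketch only: phi, the data, and the step size are assumptions."""
    theta = np.zeros(phi(transitions[0][0]).shape[0])
    for _ in range(n_iters):
        grad = np.zeros_like(theta)
        for s, r, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            delta = r + gamma * f_next @ theta - f @ theta  # Bellman residual
            # The true gradient of 0.5 * delta**2 w.r.t. theta involves both
            # phi(s) and phi(s'), which distinguishes residual gradient from TD(0).
            grad += delta * (f - gamma * f_next)
        theta += alpha * grad / len(transitions)  # one synchronous sweep
    return theta
```

For this combination, the paper's result says that a single constant learning rate alpha suffices for convergence regardless of which fixed transition set is used.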


Related Papers

Reinforcement Learning with Linear Function Approximation and LQ control Converges

Reinforcement learning is commonly used with function approximation. However, very few positive results are known about the convergence of function-approximation-based RL control algorithms. In this paper we show that TD(0) and Sarsa(0) with linear function approximation are convergent for a simple class of problems, where the system is linear and the costs are quadratic (the LQ control problem)...
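For reference, the update whose convergence is studied here is semi-gradient TD(0) with a linear value function; the sketch below is a generic version, with the feature map and hyper-parameters as placeholder assumptions (for LQ problems the natural features are quadratic in the state).

```python
import numpy as np

def td0_linear(phi, transitions, gamma=0.99, alpha=0.01, n_epochs=100):
    """Semi-gradient TD(0) with linear function approximation.
    Sketch only: phi, the data, and the hyper-parameters are assumptions."""
    theta = np.zeros(phi(transitions[0][0]).shape[0])
    for _ in range(n_epochs):
        for s, r, s_next in transitions:
            delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta  # TD error
            theta += alpha * delta * phi(s)  # semi-gradient: bootstrap term not differentiated
    return theta
```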


Convergent Tree-Backup and Retrace with Function Approximation

Off-policy learning is key to scaling up reinforcement learning, as it allows learning about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this paper, we show that the Tree Ba...
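As background, Tree-Backup and Retrace differ mainly in the per-step trace coefficient c_t that cuts off multi-step returns; a small sketch of those coefficients, following their usual formulations with target policy pi and behavior policy mu, is:

```python
def tree_backup_coef(pi_prob, lam=1.0):
    """Tree-Backup: c_t = lam * pi(a_t | s_t); no behavior probabilities needed."""
    return lam * pi_prob

def retrace_coef(pi_prob, mu_prob, lam=1.0):
    """Retrace: c_t = lam * min(1, pi(a_t|s_t) / mu(a_t|s_t)); clipping the
    importance ratio keeps the variance of multi-step corrections bounded."""
    return lam * min(1.0, pi_prob / mu_prob)
```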


A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation

We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, behavior policy, and target policy, and whose complexity scales linearly in the number of parameters. We consider an i.i.d. policy-evaluation setting in which the data need not come from on-policy experience. The gradien...
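Algorithms in this line of work keep a second, auxiliary weight vector so that each update costs O(n) in the number of parameters n. The sketch below shows one gradient-TD style step in that spirit; the exact update form, the placement of the importance weight rho, and the step sizes are assumptions, not necessarily the paper's algorithm.

```python
import numpy as np

def gradient_td_step(theta, w, f, r, f_next, rho=1.0, gamma=0.99, alpha=0.01, beta=0.05):
    """One O(n) gradient-TD style update. theta: main weights; w: auxiliary
    weights tracking an expected TD-error direction. Sketch under assumptions."""
    delta = r + gamma * f_next @ theta - f @ theta                # TD error
    w = w + beta * (rho * delta * f - w)                          # track E[rho * delta * phi]
    theta = theta + alpha * rho * (f - gamma * f_next) * (f @ w)  # corrected step
    return theta, w
```

Both updates are plain vector operations, which is what keeps the per-step complexity linear in the number of features.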


Applying Policy Iteration for Training Recurrent Neural Networks

Recurrent neural networks are often used for learning time-series data. Under a few assumptions, we model this learning task as the minimization of a nonlinear least-squares cost function. The special structure of the cost function allows us to build a connection to reinforcement learning. We exploit this connection and derive a convergent, policy-iteration-based algorithm. Furthermore,...


Experimental analysis of eligibility traces strategies in temporal difference learning

Temporal difference (TD) learning is a model-free reinforcement learning technique, which adopts an infinite-horizon discounted model and uses an incremental learning technique for dynamic programming. The state value function is updated from sample episodes. Utilising eligibility traces is a key mechanism for enhancing the rate of convergence. TD(λ) represents the use of eligibility traces...
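For concreteness, here is a minimal sketch of TD(λ) with accumulating eligibility traces over one sample episode; the feature map and all hyper-parameters are assumptions.

```python
import numpy as np

def td_lambda_episode(phi, episode, theta, lam=0.8, gamma=0.99, alpha=0.05):
    """TD(lambda) over one episode of (s, r, s') transitions with a linear
    value function. Sketch only: phi and hyper-parameters are assumptions."""
    e = np.zeros_like(theta)                 # eligibility trace
    for s, r, s_next in episode:
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta  # TD error
        e = gamma * lam * e + phi(s)         # decay old credit, add current features
        theta = theta + alpha * delta * e    # update all recently visited states
    return theta
```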




Publication date: 2002